Development of Machine Learning based Natural Language Processing System
Abstract
A disease diagnostic knowledge base system (including Q&A, a consistency checker, and an informativity checker) demands a high degree of strictness in its query results, so at the development stage great effort is needed to improve its machine learning based statistical performance, measured by precision and recall, with respect to both training error and prediction. Test performance generally depends on two basic factors: first, the underlying technique for in-depth context analysis of corpora with inference capability; second, the technique for analyzing the user's query sentence with inference capability. More importantly, a disease diagnostic knowledge base system should be able to incorporate the latest research achievements effectively and in a timely manner. To meet these requirements, we propose an automatic system for constructing a knowledge base from an academic archive of medical literature. For the purposes of this paper, we designed and implemented a prototype that constructs a knowledge base for early diagnosis of Alzheimer's disease using a natural language processing system. Since plenty of knowledge base systems for Alzheimer's diagnosis are available in English, we differentiated our work from existing data by conducting our research on literature written in Korean. The natural language processing system proposed in this paper consists of 8 modules, most of which are machine learning trainer/prediction models based on the maximum entropy algorithm. Tests showed that, for
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two expressions are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...
Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees
Automatically identifying words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct semantic roles to them may lead to improvement in many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
An Ensemble Learning-Based Algorithm for Learning to Rank in Information Retrieval
Learning to rank refers to machine learning techniques for training a model in a ranking task. Learning to rank has been shown to be useful in many applications of information retrieval, natural language processing, and data mining. Learning to rank can be described by two systems: a learning system and a ranking system. The learning system takes training data as input and constructs a ranking ...
Coreference resolution with deep learning in the Persian Language
Coreference resolution is an advanced problem in natural language processing. Nowadays, with the spread of social networks, TV channels, news agencies, the Internet, etc. in human life, reading all of the content, analyzing it, and finding relations within it takes time and money. In the present era, text analysis is performed using various natural language processing techniques, one ...
A'laam Corpus: A Standard Named Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. A NER system is tasked with determining the boundaries of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...